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Abstract 

The chemical Fokker-Planck equation and the corresponding chemical Langevin equation are 
commonly used approximations of the chemical master equation. These equations are derived 
from an uncontrolled, second-order truncation of the Kramers- Moyal expansion of the chemical 
master equation and hence their accuracy remains to be clarified. We use the system-size expan- 
sion to show that chemical Fokker-Planck estimates of the mean concentrations and of the variance 
of the concentration fiuctuations about the mean are accurate to order for reaction systems 

which do not obey detailed balance and at least accurate to order 17^^ for systems obeying de- 
tailed balance, where is the characteristic size of the system. Hence the chemical Fokker-Planck 
equation turns out to be more accurate than the linear-noise approximation of the chemical master 
equation (the linear Fokker-Planck equation) which leads to mean concentration estimates accurate 
to order ^2"^/^ and variance estimates accurate to order n~^^^. This higher accuracy is particularly 
conspicuous for chemical systems realized in small volumes such as biochemical reactions inside 
cells. A formula is also obtained for the approximate size of the relative errors in the concentration 
and variance predictions of the chemical Fokker-Planck equation, where the relative error is defined 
as the difference between the predictions of the chemical Fokker-Planck equation and the master 
equation divided by the prediction of the master equation. For dimerization and enzyme-catalyzed 
reactions, the errors are typically less than few percent even when the steady-state is characterized 
by merely few tens of molecules. 
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I. INTRODUCTION 



Chemical master equations (CMEs) are the accepted mathematical description of chem- 
ical systems in well- mixed conditions These equations provide a mesoscopic description 
of chemical kinetics, interpolating between the microscopic regime of molecular dynamics 
and the macroscopic regime of rate equations (REs). It has been shown that CMEs are 
exact descriptions for any well-stirred and thermally equilibrated gas-phase chemical system 
[2!. More recently it has been rigorously confirmed that their validity extends to chemical 
reactions in well-stirred dilute solutions jsl- However, well before these rigorous demonstra- 
tions of the microscopic physical basis of the CME, scientists have employed these equations 
to probe the nature of mesoscopic chemical kinetics and in particular to understand how 
this may differ from kinetics on macroscopic length scales (see McQuarrie for a review ^ of 
the literature up till 1967). 

We briefly review the CME formalism. Consider a general chemical system consisting of 
a number of distinct chemical species interacting via R elementary chemical reactions of 
the type: 

k ■ 

SijXi + ... + SnjXn rijXi + ... + TnjXn. (1) 

Here j is an index running from 1 to R, Xi denotes chemical species i, Sij and rjj are the 
stoichiometric coefficients and kj is the macroscopic rate of reaction. If this system is well- 
mixed then its mesoscopic state is fully determined by the vector of the absolute number 
of molecules of each species, n = (ni, ...,niy)'^, where is the number of molecules of the 
i^'^species. The CME is then a time-evolution equation for the probability of the system 

5|: 



being in a particular mesoscopic state 



= ^E(n^r''^ - l)kn,n)P{n,t), (2) 

where Q is the volume of the compartment in which the reactions occur and is a step 
operator - when it acts on some function of the absolute number of molecules, it gives 
back the same function but with replaced by + x. The chemical reaction details are 
encapsulated in the stoichiometric matrix S^j = Tij — Sij and in the microscopic rate functions 
/j(n, Vt). The probability that the j*^ reaction occurs in the time interval [t, t + dt) is given 
by Qfj{n,Q)dt. For elementary reactions, the microscopic rate function takes one of four 
different forms, depending on the order of the j*'* reaction: (i) a zeroth-order reaction by 



which a species is input into a compartment gives fjin, Q) = kj; (ii) a first-order unimolecular 
reaction involving the decay of some species h gives fj{n,Q) = kjUhQ^^; (iii) a second- 
order bimolecular reaction between two molecules of the same species h gives fj {n, Q) = 
kjUhiuh — 1)^"^; (iii) a second-order bimolecular reaction between two molecules of different 
species, h and v, gives fj{n,Q) = kjUhn^Q'"^ . 

The RE description of the same system is much simpler. Denoting the macroscopic 
concentration of species i hj (pi, the set of REs describing the macroscopic kinetics of the 
reactive system represented by Eq. (1) are given by: 

^ = f:s.,m, (3) 

where (p = {4>i, ■■■,4>n)'^ is the vector of macroscopic concentrations and fj is the macro- 
scopic rate function of the j^^ reaction which has the general mass-action form, fj{(f)) = 
kj Y[m=i • provide a continuous deterministic "many molecule" description of ki- 
netics. This strongly contrasts with the CME description which constitutes a discrete, 
stochastic, "any number of molecule" description that is faithful to the underlying micro- 
scopic basis of chemical reactions. 

Unfortunately, one of the main advantages of CMEs over their RE cousins, their discrete 
description, is also the source of their computational intractability. Differential-difference 
equations, such as the CME j^, do not lend themselves easily to analysis. In contrast, 
there is a vast body of literature in engineering, mathematics and physics dealing with the 
analysis and solution of differential and partial differential equations. Thus at an early 
stage, considerable effort was invested in obtaining a partial differential approximation of 



the CME. In the 1940's, Kramers jsl and Moyal [7| developed a Taylor series expansion of 
the CME; by assuming that all terms with derivatives greater than two are negligible, one 
obtains the chemical Fokker-Planck equation (CFPE, [8]), a second-order partial differential 
equation of the form: 

j=l ^ i=l i,w=l ' 



As Gardiner mentions in his book [8[, "this procedure enjoyed wide popularity - mainly 
because of the convenience and simplicity of the result" and also because "it is often simpler 
to use the Fokker-Planck equation than the Master equation." A major and important 



difference between the CME and the CFPE is that rii is a positive integer for the CME 
while it is a real number for the CFPE. 

Several authors have questioned the validity of the CFPE approximation. The approxi- 
mation is obtained by a perfunctory truncation of the Taylor expansion and hence it appears 
to be an uncontrolled and unjustified approximation of the CME. van Kampen, in partic- 
ular, was a leading and influential critic of the CFPE approximation. In the 1960's and 
70's, he developed a systematic perturbative expansion of the CME in powers of the inverse 
square root of the system volume fl (the system-size expansion) and used it to show that to 
lowest order in the expansion, i.e. in the limit of large volumes - the macroscopic limit, one 
obtains a Fokker-Planck equation which is of a different form than the CFPE jo], [lo|. Of 
particular concern is that van Kampen's Fokker-Planck equation is linear whereas the CFPE 
is non-linear. Note that by non-linear Fokker-Planck equation here we mean one such that 
its drift and diffusion coefficients are generally non-linear functions of the molecule numbers 
ni] this convention is adopted since it is in mainstream use, for example see the book by van 
Kampen jsj. Taking into account higher-order terms in the system-size expansion does not 
lead to the CFPE as well. However, interestingly, in the limit of large volumes, the CFPE 
does reduce to van Kampen's linear Fokker-Planck equation [8]. This led van Kampen to 
conclude that any features arising from the non-linear character of the CFPE are spurious 



and not to be taken seriously 



We note that the limit of large volumes in van Kampen's 



system-size expansion is taken at fixed macroscopic concentrations and hence it corresponds 
to the limit of large molecule numbers jsl. Hence van Kampen's conclusions can be equiva- 
lently stated as: the CFPE becomes a legitimate approximation of the CME in the limit of 
large molecular populations. 



A few studies at the time 
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13| did suggest that the CFPE's validity extended be- 



yond the linear regime. Of particular importance is a result of Horsthemke and Brenig 
13| which motivated the present study. The authors considered a simple dimerization re- 
action 0— X + X— T-y whereby molecules of a monomer species X are introduced 
in a compartment of volume Q and subsequently they bind to each other to form dimers 
Y. Assuming stationary conditions, the CME and CFPE are solved exactly. It is shown 
that the average concentration of monomers and the variance of the fluctuations from the 
two formalisms agree exactly to order and are respectively equal to + (8^2)"^ and 
(3/4)0 where is the macroscopic concentration obtained by solving the corresponding 
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RE in steady-state conditions. The same example can be found worked in van Kampen's 
book js] wherein he shows that the hnear noise approximation gives mean and variance 
equal to and (3/4)0 Q^^. As we mentioned before, a linearization of the CFPE will lead 
to the linear-noise approximation and hence from this example we can conclude that the 
non-linearity of the CFPE is non-spurious since it leads to a more accurate concentration 
estimate than that which is obtained from the linear-noise approximation. However one 
could argue that this higher accuracy is only particular to the dimerization example and not 
a general feature of the CFPE. Because of this or other reasons, the results of Hortshemke 
and Brenig do not appear to have received the attention they deserved at the time and 
van Kampen's conclusions about the CFPE were accepted, by and large, by the statistical 
physics community. 

Approximately 40 years later after the inception of the system-size expansion, Gillespie 



^ by deriving it without invoking truncation 
14 ]. To be precise, he derived the chemical 



^m{t) = J]s,,/,(r!(t),r]) + ^]V2^^,,y/,■(n(^),fi))^,(t), (5) 



revived the question of the validity of the CFP 
of the Kramers-Moyal expansion of the CME 
Langevin equation (CLE): 

R R 

dt 



3=1 i=i 
where Tj{t) are temporally uncorrelated, independent Gaussian white noises. This stochas- 
tic differential equation is exactly equivalent to the CFPE in the sense that its solution 
generates exact sample paths of the CFPE, Eq. (4). Essentially he showed that the CFPE 
approximation is valid provided two conditions are satisfied. A large number of molecules 
suffices to ensure that both conditions are satisfied however this is NOT a necessary condi- 
tion. This suggests that there are regimes in which the particle numbers may not be very 
large and yet the CFPE may still provide a reasonably good approximation of the CME. 
However Gillespie's derivation does not provide us with a means to estimate the accuracy 
of the CFPE for general chemical systems. 

Questions regarding the validity and accuracy of the CFPE and CLE are more important 
now than ever before. In the past decade, interest has virtually exploded in realistic stochas- 



tic simulations of biochemical reactions inside cells |15l-ll8|. The exact method of sampling 
the trajectories of the CME, the stochastic simulation algorithm 19|, is computationally 
expensive and the CME is analytically intractable; thus approximate methods such as the 
CFPE and the CLE have come to the foreground as an alternative means to obtain numeri- 



cal and theoretical insight into the functioning of intracellular biochemical networks 20l-|24 1 . 



These networks are typically characterized by a large number of bimolecular reactions in 



which at least one of the species is present in very small molecule numbers [16|, |25|, |26| . 
indeed the precise conditions in which the fidelity of the CFPE remains unclear. Hence the 
question of the accuracy of the CFPE has nowadays become a practical one - how much 
can we trust the conclusions derived from the CFPE or the corresponding CLE? 

In this article, we derive formulas to estimate the relative error in the CFPE predictions of 
the mean concentrations and of the variance of the fluctuations about the mean. The results 
are valid for all monostable chemical reaction networks. As a byproduct of our derivation, we 
will also clarify the connection between the CFPE and van Kampen's system-size expansion, 
in particular showing that the non-linear character of the CFPE is not completely spurious 
and that generally CFPE estimates are more accurate than those obtained from the linear 
Fokker-Planck equation. The article is organized as follows. In Section II, we use the 
multivariate system-size expansion to derive expressions for the mean concentrations and 
for the variance of the fluctuations as predicted by the CME accurate to order 0(fi~^). 
In Section III, we develop the system-size expansion of the CFPE and use it to derive 
expressions for the mean concentrations and for the variance of the fluctuations accurate 
to the same order as derived for the CME in Section II. In Section IV, we use the results 
of the previous two sections to derive expressions for the relative error in the predictions 
of the CFPE. We also compare the predictions of the CFPE and the linear Fokker-Planck 
equation. These results are tested on two bimolecular reaction systems - dimerization and 
an enzyme-catalyzed reaction - in Section V. We conclude by a discussion in Section VI. 



II. PERTURBATIVE EXPANSION OF THE CME 

A. The Multivariate System-Size Expansion of the CME 

We will now probe the mesoscopic description provided by the CME using the system-size 
expansion developed by van Kampen This method allows one to derive expressions for 
the mean concentrations and for the variance of the fluctuations about these concentrations, 
as predicted by the CME, accurate to the order of any desired power of the inverse square 
root of the volume. The only requirement for the expansion to hold is that the steady-state 
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of the chemical system is asymptotically stable. For the applications that we are interested 
in, namely biochemical reactions in intracellular conditions, the number of molecules can 
be very small, in some cases just few tens of molecules of a given species per cell. We will 
derive equations accurate to 0(^2^^) - this accuracy should be more than sufficient for the 
applications mentioned since terms of lower order, 0{Q^^), already imply corrections to the 
concentrations of the order of a single molecule in the compartment. To our knowledge this 
is the first time that the system-size expansion has been carried to this order for a general 
system of N interacting chemical species, van Kampen has treated a one species example 
to the same order in his book jsl while Elf and Ehrenberg 27 1 have derived the multivariate 
expansion to 0(^7°). 

The starting point of the system-size expansion is to write the absolute number of 
molecules of species i as: 

^ = 0. + n-'/h., (6) 

where (pi is the macroscopic concentration of species i as determined by the REs. This has 
the effect of transforming all functions of rij in the CME into functions of e^. The expansion 
of the CME proceeds by writing Eg. (2) in terms of the new variables. Details of this 
transformation can be found in |28|; here we will simply state the relevant results and use 
them for our present derivation. The variable change causes the probability distribution 
of molecular populations, P{n,t), to be transformed into the probability distribution of 
fluctuations, n(e', t), where e = (ei, ...,eAr)^. The time derivative, the step operator and the 
microscopic rate function in the CME, read in the new variables: 

dP{n, t) dU{e, t) ^1/2 d4>, dU{e,t) 



dt dt ^ dt dt, 

1=1 



N 



fe=l 

= J2 + cp-' + cp-^'\ (9) 



1=1 fe=l 

2 



fc=0 
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where 

N 



^ i=l ' 



b 



1 / ^ a \ 

M E'»s^ M?). (11) 



N 



,__ly^ ^ (12) 

(13) 



Note that in Eq. (9) the microscopic rate function is expressed in terms of the macroscopic 
rate function. As we shall shortly see, this is convenient from a calculation point of view 
since the final expressions for the means and variances will be solely in terms of functions 
which appear in the REs. Note that the upper limit of the sum in Eq. (9) is 2 because 
all reactions involve at most the interaction of two molecules and hence equals zero for 
k > 2. Although our analysis is specifically for elementary reactions, one can easily extend 
the approach to include "elementary complex" reactions [27]. However we shall not pursue 
this here. 

Substituting Eqs. (7-9) in Eq. (2) we get the following new form of the CME: 
n-'/' Y.^a]b] - a]b] - a]c] - a,^6°)n(e, t) + 



R 



n-'/' ^(a^c,^ ^ ^4^1 _ _ ^3^2 _ «5^0)n(g', t) + 0{Q-'). (14) 

Note that terms proportional to Q^^^ do not appear in the expansion of the CME. This is 
because when one substitutes Eqs. (7-9) in Eq. (2), one equates terms of this order on both 
sides of the CME which simply gives us back the macroscopic REs, Eq. (3). 

To proceed further we need the explicit dependence of the right hand side of Eq. (14) on 
the new variables ej. This is obtained by substituting Eqs. (10-13) in Eq. (14) which leads 
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to: 



+ 



0~^l — (a T\\ _l_ i iwmpp, I TT\ _ ^ jw{2) I q2 tt _ ^ tw / tt\ 

+ —D- u]+Q'^^^i --J^^^^d- (e U) + —J"" d- (e U) - —J'"'' 

1 (2) 1 \ 

Note that in the above equation, we have used the Einstein summation convention where 
all twice repeated indices are understood to be summed over 1 to N . The partial derivative 
j denotes d"^ / dei..dej. We have also used the following two convenient definitions: 

R 

^ij--r = SikSjk.--Srkfk{(p), (16) 



k=l 



From Eq. (3) it follows that Di = d(f)i/dt and consequently Jf represents the i-s element of 
the Jacobian matrix associated with the REs of the system. 

Note that Eq. (15) to order is the linear Fokker-Planck equation which was mentioned 
in the introduction. The drift vector is linear in the e variables while the diffusion tensor is 
independent of them. Both depend on time via their own dependence on the macroscopic 
concentrations. This level of approximation is frequently called the linear-noise approxi- 
mation, a nowadays popular means of estimating the size of the concentration fluctuations 
about the macroscopic concentrations We are interested in the dynamics on mesoscopic 
length scales and hence we shall consider terms of higher order than in Eq. (15). 



B. Time-evolution equations for the moments 

We now proceed to construct equations for the moments of the e variables. We start by 
expanding n(e*, t) as a series in powers of the inverse square root of the volume: 

oo 

n(e,t) = 5^n,(e,t)fi-^/2, (18) 
i=o 
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from which it follows that the moments possess an equivalent expansion: 

oo 

(efce^.-.e^) = ^[eke^...€r]jil~^^^ , (19) 

j=0 

where 

[ekem...er]j = J efcem--er ^j{e,t)de. (20) 

The angled brackets denote the statistical average. Some subtle points associated with the 
perturbative expansion in the probability density and with the physical interpretation of 
[efee^...ej.]j are discussed in Appendix A. The time-evolution equations for the moments are 
obtained as follows. One starts by substituting Eq. (18) in Eq. (15), multiplying the 
resulting equation on both sides by efce^-.-er and integrating over de. Equating terms of 
order fi"^/^ on both sides of the equation gives the time-evolution equation for [ek^m---^r]j- 
Finally one constructs the time-evolution equation for the moments using Eq. (19). 

As mentioned earlier, our aim is to determine the mean concentrations and the variance 
of the fluctuations about the means and hence we must relate the latter to the moments 
of the e variables above. Using Eqs. (6) and (19), one can easily verify that the mean 
concentration of species i and the variance of the fluctuations about it, accurate to order 
Q^^ are respectively given by: 



^ <P^ + ^^''\e,) = 0, + 0-^/2 Y.\.'r]j^~'" + 0{Sl-"% (21) 



3=0 

2 1 2 

= ^^-'(EK'].^^-^'/' - (EN^-^"'^') - ^-%]oH^+0{n-'l'). (22) 

Hence it is clear that to determine the mean and variance accurate to order we shall 
need to determine the first and second moments of the e variables accurate to orders 
and Qr^ respectively. 

We proceed by implementing the calculation recipe outlined just after Eq. (20) to derive 
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equations for the corrections to the second moments accurate to order Q ^: 
d 

— [e,efe]o = Jr[ewek]o + {r ^ k) + Drk, (23) 

+ (r^ A;) + 4-,[eJo, (24) 

^[erekh = J:M2 + ljr[e^e,ek]i - ^ J.'^'V-Ni 
ot II 

- + (r ^ fc) + JZ{eA^ + \jT[^^^m\ - (25) 

Details of the calculations leading to the above equations are illustrated by a step-by-step 
derivation of Eq. (25) in Appendix B. Note that the short-hand notation (r ^ A;) stands 
for all the expressions of the same form as the ones preceding the notation but with r and 
k interchanged. For example in Eq. (24), (r A;) stands for J^[e^„er]i -l- \ Jk^\^w^v^i\Q — 
I J^''^Vui[er]o- This notation will be used throughout the rest of the article since it enables 
the equations to be written in a compact way. 

The equation for [er.efc]o, Eq. (23), is a Lyapunov equation which can be solved either 
analytically (see for example 27|, l30|) or else numerically, for example using the built in 
functions of Matlab and Mathematica. Solution of the equation for [e^efc]!, Eq. (24), requires 
the solutions of the equations for the first and third moments to order ^2°: 

|[e.]o = J^Mo, (26) 

d 

^[erefce,]o = J^le^^efcerlo + (/ ^ A;) + (fc ^ r) 

+ /^wNo + (A;^/) + (r o/). (27) 

Note that in Eq. (27), (/ 4-7- fc) + (fc f4 r) stands for two expressions; the first expression 
corresponds to the first term on the right hand side of Eq. (27) with / and k interchanged 
and the second expression is the first expression just obtained with k and r interchanged. By 
a similar reasoning, it follows that (A; ^ /)-|-(r ^ /) in Eq. (27) stands for Drfc[e«]o + -Difc[er]o- 
Note that in steady-state conditions, [e^Jo = [erCfce^Jo = and and consequently there is no 
correction to the second moments to 0(i7~^), i.e., [erCfc]! = 0. 

Solution of the equation for [erefc]2, Eq. (25), requires the solutions of the corrections to 
the the first and third moments to order and the second and fourth moments to order 
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(28) 



d_ 
di 



+ {l^k) + {k^r) + Dri[ek]i + Jri[ewek]o + {k ^ I) 
+ {r^l) + Drki, 



(29) 



d_ 
di 



[erekeiem]o = Jri^w^keiemlo + {r ^ m) + {m ^ k) + {k ^ I) + -Drm[efee/]o 



+ (m -H- Z) + (Z -H- A;) + (r -H- m) + (m -H- Z) + (A; -H- m). 



(30) 



The procedure to obtain the second moments to order is now clear. One first solves Eq. 
(23) to get [erejk]o; then one solves Eqs. (26-27) and substitutes in Eq. (24) to get [e^efe]!; 
finally one solves Eqs. (28-30) and substitutes these, together with the solution of Eq. (23), 
in Eq. (25) to get [e,.efe]2. 

The first moment equations and the corresponding equations for the mean concentrations 
can be obtained in an analogous manner as for the second moments. The equations for [er]o 
and [€r\i have been already derived, Eqs. (26) and (28), respectively. The equations for [6^)2 
and [crls are given by: 



The procedure to obtain the first moments to order Q~^/^ is now also clear. One first solves 
Eq. (26) to get [e^Jo; then one solves Eq. (23) and substitutes in Eq. (28) to obtain [e,.]i; 
finally one uses the solutions already obtained when deriving the second moments to solve 
Eqs. (31-32) for [srh and [e^jg. 

Given the first and second moments accurate to Jl"^/^ and one finally determines 
the mean concentrations and the variance of the fiuctuations about them accurate to 
from Eqs. (21-22). Although the procedure of obtaining the latter final expressions is fairly 
laborious, as we shall see in the next section, in order to obtain the leading order error in the 
predictions of the CFPE, it will only be necessary for us to solve very few of these equations 
explicitly. 



d_ 

m 

d_ 
di 





(31) 



(32) 
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III. PERTURBATIVE EXPANSION OF THE CFFE 



In this section we use the system-size expansion to derive expressions for the mean con- 
centrations and the fluctuations about them, as predicted by the CFPE, accurate to 0(f2~^). 
To the best of our knowledge this is the first time that the expansion has been used on the 
CFPE although the method is similar in principle to the small-noise expansion of Fokker- 
Planck equations as presented by Gardiner 8|. The CFPE is obtained by truncating the 
Kramer's Moyal expansion to include at most second-order derivatives: 

R / N 



dPi^ = n (n E:'-' - l)f,(n, a)P[n, t) 

j=l ^1=1 / 

R.N . 



-"E(-Es.a;^ + 2 E (33) 

j=l ^ i=l i,w=l ' 

Note that the second step above, follows by Taylor expanding the step operator. 

Next we perform the system-size expansion on the CFPE, Eq. (33), i.e., we make the 
variable transformation given by Eq. (6) which transforms functions of rij into functions 
of the new variables e^. The probability distribution P(n, t) is transformed into a new one 
Tlp{e^^). Note that the subscript F will be used to distinguish quantities calculated using 
the CFPE from those previously calculated using the CME. The time derivative on the left 
hand side of the equation and the microscopic rate function /j(n) transform as in the case 
of the CME and are given by Eqs. (7) and (9) together with the definitions Eqs. (11-13) 
and with n(e', t) replaced by '^p{^^t\ The operators involving derivatives with respect to 
absolute particle number transform as follows: 

Es.£: = f-"^«]. (34) 

1 ^ 

i,w=\ * ^ 

where the operators are as defined in Eq. (10). Hence the CFPE in the new variables 
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reads: 

R 

dt 



R 



R R 

Zl^S'^l + «M - + J] alcjn^ (e; t). (36) 

Note that whereas the transformation given by Eq. (6) on the CME leads to an infinite 
series in powers of the inverse square root of the volume, Eq. (14), the same transformation 
on the CFPE leads to a finite series with the highest order term being of order (this 
is only true for elementary reactions). 

The derivation of the equations for the time evolution of the moments of the e variables 
proceeds in an exactly analogous manner as to that presented in detail in section 11. The 
probability distribution is written as a series in powers of the inverse square root of the 
volume, 

3 

U^{e,t) = J2^i.,jie,t)n-^/'. (37) 
and the moments are then generally given by: 

3 

{€k€jn--er)F = ^[€kem--er]F,j^''^^^ : (38) 

where 

= n.,(F.O*-. (39) 

The equations for the mean concentrations and the variance of the fiuctuations about them 
are given by Eqs. (21-22) with the subscript F carried throughout. The time evolution 
equations for the corrections to the moments can be derived as before. Although there is 
some repetition involved, we will state these equations in full so that the differences between 
them and those derived using the CME are very clear. 
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The equations for the corrections to the second moments accurate to order Q, ^ are: 
d 

-Q^[erek]F,o = Jri^weklFfl + {r^k) + Drk, (40) 
d 11 

-Q^[^r^k]F,l = J^[^w^k]F,l + ■^Jr^[^w^p^k]F,0 — -^Jr^"^^ 4>w[^k]F,0 

+ {r^k) + JZK]f,o, (41) 
d 11 

-Q^[^r^k]F,2 = Jr[^w^k]F,2 + ■^Jr^[^w^p^k]F,l — 2 '^r"''^ V«) [^fc] F,l 

- lj:^%n.ek]F,o + {r^k) + JZie^Ui + ^JTi^^emUo - \jTct>^. (42) 

Note that these are the same as Eqs (23-25) but with subscript F; the imphcit reason for this 
is that only terms containing a] and contribute to the equations for the second moments 
and all such terms are equally present in Eqs. (14) and (36). Note also that Eqs. (23) and 
(40) lead to the same solution, i.e., [e^efelo = [er-eifc]F,o- The solution of [€^6^]^?^! is dependent 
on the solutions of the time evolution equations for [e^J^.o and [er^k^ilpfi- The equations for 
the latter are the same as Eqs. (26-27) but with subscript F\ this is since Eq. (14) and (36) 
are equal to order 0°. It follows that [e^jo = [er]F,o and [erekei]^ = [erek^i\F,o from which we 
can conclude using Eq. (41) that [erefe]i = [er^k\F,\- However, as we now show, generally 

[^r^k]2 7^ [^r^l^F,2- 

The solution of [ere/;]i?^2 is dependent on the solutions of the time evolution equations for 

[er]F,i, [erefce;]^,! and [erekeiej^p^o which are: 

11 

^[CrCfeQlF,! = Jr[^w^k^r]F,l + -^J^^[^w^p^r£k]F,0 — -^Ji (pwi^r^kjFfi 

+ {l^k) + {k^r) + Dri[ek]F,i + JriKek]F,o + {k ^ I) 

+ {r^l), (44) 

d 

-^[^r^k^l^m]F,0 = Jr[^w^k^l^m]F,0 + {r ^ m) + {m ^ k) + {k ^ I) + Drm[ek^l\F,0 

+ {m I) + {I -ir^ k) + {r -H- m) + {m ^ I) + {k -(r^ m). (45) 

Equations (43) and (45) have the same form as Eqs. (28) and (30) respectively. This 
combined with the fact that the right hand sides of Eqs. (43) and (45) are functions of 
[ereklFfi and that [e^e^jo = [erek\F,o, implies that [e^]! = [er\F,i and [e^ejke/e^lo = Kefeeie^J^.o- 
However note that Eq. (44) has one term missing compared to its counterpart Eq. (29) 
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and hence generally [ej.efee;]i 7^ [er^feezlF,! from which it follows using Eq. (42) that [erefe]2 7^ 

The only remaining equations to be considered are those paralleling Eqs. (31) and (32) 
for which we find: 

^[er]F,2 = J:!'K]f,2 + IjrMp,, - ^J:^%^]f,o, (46) 

^[erUs = J^e^cUs + \jrM^,2 - (47) 

By similar arguments to the above, these equations imply [er\2 — ['^r]F,2 and [e^ja 7^ [er]F,3- 
Hence, in summary, we have obtained the following results: 

1- [er]o = [er]F,0, [^r^k]o = [^r^k]F,0 , [^r^kQ]o = [^r^kQ]F,0, [^r^kQ^m]o = [^r^kQ^m]F,0 
2. [€r]l = [er]F,l, [CrCfc]! = [6^6^]^^!, [erefcCijl 7^ [^r^k^ljF,! 

3- [^rh — [^r]F,2, [^r^kh 7^ [er-eifc]F,2 

4- i^rh 7^ [er]F,3 

Note that these results arc not for the moments but for the corrections to the moments; the 
real physical meaning of these results in terms of means and variances will be elucidated in 
the next section. 

Using Eqs. (32) and (47), Eqs (25) and (42) and Eqs. (29) and (44), we can respectively 
write down simple equations for the differences in the corrections to the first, second and 
third moments as predicted by the CFPE and the CME: 

^A, = J-A^ + lj-fA^„ (48) 

d ^ «, 1 «, wp 

"Oj^^rk — Jr ^wk ~l~ <-^fe ^wr ~l~ l^^^r ^ ^wpk ~l~ J]^ ^wpr)i (49) 
^ w w w 

— Arfei = JJ^Au;kr + Jk^wlr + ^wlk + Drkh (50) 

at 

where we have used the convenient definitions: 

A,. = [e^Ja - [er]F,3, (51) 

Arfc = [^r^}^2 — [^r^}^F,2, (52) 
^rkl = [^r^k^l]l — [^r^k^l]F,l- (53) 
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IV. COMPARISON OF THE PREDICTIONS OF THE CFPE AND THE CME 

In this section we will use the results derived in the last section to obtain formulas for 
the absolute and relative errors (to leading order) in the CFPE predictions of the mean 
concentrations and the variance of the fluctuations. Using these formulas we will be able 
to deduce the general conditions in which the differences between the CFPE and the CME 
are minimal. Furthermore we will show that the CFPE is generally more accurate than 
the linear Fokker-Planck equation of van Kampen and that the mean concentrations of the 
CFPE to order are precisely the same as those obtained from Effective Mesoscopic Rate 
Equations. 

A. Estimating the absolute and relative errors in CFPE predictions 

We will now derive expressions for the leading order term of the absolute and relative 
errors made by the CFPE in predicting the mean concentrations and the variance of the 
fluctuations about the mean concentrations. We will also obtain an expression for the 
leading order term of the absolute error made by the CFPE in predicting the skewness of 
the probability distribution of the concentrations. 

The mean concentration predicted by the CME, (rii/il), is given by Eq. (21) while the 
mean concentration predicted by the CFPE, {ni/Q)F is given by the same equation but 
with the subscript F carried throughout. Subtracting the two expressions and using the 
summary of results in Section III together with Eq. (51) we get the absolute error in the 
CFPE concentration: 



Similarly, using Eq. (22) and using the summary of results in Section III together with Eq. 
(52) we find the absolute and relative errors in the variance of the fluctuations to respectively 




(54) 



The relative error follows easily: 




(55) 
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be given by: 

a^-a% = AM-' + 0{n-'/'), (56) 

ELr = = 4^^-' + 0{n-'/% (57) 

where cfLj^j^ is the variance in the concentration of species i as estimated by the hnear- noise 
approximation, i.e., cf^^j^ = Q~^{[ef]Q — [ej]o). Hence the recipe for calculating the errors of 
the CFPE predictions is now complete. One first solves Eqs. (48-50) and then substitutes 
their solution in Eq. (54-57). Note that this calculation recipe is valid for all times and not 
only in steady-state conditions. 

Note also that since the denominator in Eq. (57) is the linear-noise approximation es- 
timate for the variance then the leading relative error term in the variance is proportional 
to In contrast the leading relative error term in the mean concentrations, Eq. (55), 

is proportional to Hence the CFPE's estimates of mean concentrations are generally 

expected to be more accurate than those of the variance of the fluctuations about the mean 
concentrations. 

Finally we obtain the absolute error in the CFPE prediction of the skewness of the 
probability distribution of the concentration of species i. The skewness is defined as: 

-((^-(^»)--' 

The absolute error in the skewness is then Elf,^^ — Si — sp^i where SF,i is the skewness 
predicted by the CFPE, i.e., Eq. (58) with subscript F throughout. As before, by using 
using Eqs. (21-22) together with the summary of results in Section III and Eq. (53) we get: 

ELw-4^^-' + o{n-'/'). (59) 

^i,LNA 

B. The CFPE is more accurate than the linear noise approximation 

We can now answer the question: which of the two, CFPE or linear Fokker-Planck equa- 
tion, is the most accurate? We note that the linear Fokker-Planck equation (or equivalently 
the linear-noise approximation) is obtained by keeping only terms of order Q° in Eq. (14). 
If we do the same on the expansion of the CFPE, i.e. Eq. (36), then we also get the same 
hnear Fokker-Planck equation. This equality imphes that the CFPE becomes correct for 
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large enough volumes or equivalently for large enough molecular populations. This result 
was known to van Kampen and is discussed in the book by Gardiner Q . 

Within the linear-noise approximation, one can calculate the two quantities [e^jo and 
[ere/;]o using Eqs. (26) and (23) respectively. The quantities [er]m and [ere^Jm where m > 
are all zero in this approximation since the expansion has only terms to order Now 
the initial condition for the CME is a delta function centered on the number of molecules 



as given by the REs, i.e. at time t = 0, the average number of molecules of t 



and the REs agree and hence it follows that [e^Jo = initially and for all times js]. These 
results together with Eqs. (21) and (22), would seem to imply that within the linear-noise 
approximation, the mean concentrations are accurate to order fi^^/^ while the variance is 
accurate to order Q"^. However by considering terms of higher order than those leading 
to the linear-noise approximation, one arrives at the conclusion that actually the variance 
within this approximation is accurate to higher order than This can be deduced by 

noting that [e^jo = for all times implies [e^efce^Jo = [erefc]i = also for all times. Hence 
it follows from Eq. (22) that the variance in the linear-noise approximation is accurate to 
order 

Now from Eqs. (54) and (56), it is evident that generally the mean concentration and 
variance prediction of the CFPE are accurate to at least order Hence the mean 

concentration prediction of the CFPE is more accurate than that which can be obtained 
from the linear Fokker-Planck equation. It is also clear that the higher accuracy comes from 
taking into account the non-linear character of the CFPE since the Aj term in Eq. (54) is 
obtained by considering terms in Eqs. (14) and (36) of higher order than the linear-noise 
approximation. 

We can also derive an explicit equation for the mean concentrations predicted by the 
CFPE accurate to order Q^^: 

Note that the first step proceeds by taking the time derivative of Eq. (21) and the second 
step follows from using Eqs. (26) and (43), bearing in mind that [eilpfi = [ei]o. Hence the 
computation of the mean concentrations to this order requires only the solution of the REs 
and of the Lyapunov equation Eq. (23). Note that Eq. (60) is exactly the same as the 
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he CME 



Effective Rate Equations recently derived by Grima from the CME (Eq. 60 is the same as 
Eq. (22) together with Eq. (24) in Ref |28j). 



C. The CFPE is highly accurate for equal-step reactions involving one species 

Consider the case where we have N species interacting via R elementary reactions of the 
equal-step type, i.e., in each individual reaction, either p molecules of a species are generated 
or p molecules are destroyed or no molecules are generated or destroyed. In such a case, 
the stoichiometric matrix elements are Sij = ±p or 0, where p is a non-zero positive integer. 
Three examples of equal- step reactions are: 

^ Xi, A + Xi ^ 2Xi, Xi ^ 

^ 2Xi 

0^Xi,0l^X2,Xi + X2 ^0 (61) 

The first reaction is autocatalytic where A is some very abundant species whose number of 
molecules is considered constant; this is a one-step, one species reaction scheme. The second 
reaction involves the burst input of two molecules and their dimerization, a two-step one 
species reaction scheme. The third reaction involves the production and degradation of two 
species and their bimolecular interaction; this is a one-step, two species reaction scheme. 

For equal-step reactions, one species reaction schemes, the quantity Dm evaluates to 
zero in steady-state conditions: 

R 

R 

= /E^i^/^('^i) = 0. (62) 

i=i 

Note that in the last step, use was made of the steady-state condition: dt(f)i = 
J2f=i ^ijfji'Pi) = 0- From Eqs. (48-50), we can then deduce that Ai = An = Am = 0. 
Hence it follows from Eqs. (54) and (56) that the mean concentrations and the variance of 
fluctuations predicted by the CFPE for one species, equal-step reactions, are accurate to at 
least order This is impressive when one considers that the linear- noise approximation 
of the CME only leads to estimates accurate to order in the mean and order in 
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the variance. These conclusions lend support to the results of an early investigation of the 



one species CFPE 



29|. 



However, this high accuracy of the CFPE is not generally true for multispecies equal-step 
reactions. For example, for the third reaction scheme in the examples considered above, one 
finds Dill = -D222 = and D112 = -D121 = D211 = -D221 = -D212 = -D122 = -^50102 7^ 0. The 
non-zero values of Dijk for some index values implies that the mean and variance predictions 
of the CFPE in this case are accurate to order f]^^/^. 



D. CFPE is highly accurate for multispecies reactions obeying detailed balance 

In the previous subsection we have seen how Dhki is zero for one species, one-step reaction 
schemes and how this leads to a particularly high accuracy in the predictions of the CFPE. 
We now want to find the condition which forces -Dhfci = for chemical reactions involving any 
number of species. Consider the case where all reactions are reversible. Since each reaction 
can be paired with its reverse, it follows that the formula for D^ki can then be written as: 

R 

Dhki = ShjSkjSijfj{(f)) 

R/2 

= Shz+Skz+Slz+fz+{(p) + Shz-Skz-Siz^fz~{4>) 

2 = 1 

R/2 

= ^Shz+Skz+Siz+[fz+{<i)) - fz-{(f)], (63) 
2=1 

where the subscripts + and — indicates quantities evaluated for the forward and backward 
reactions respectively. The reversibility condition imposes Shz+ = —Shz- and was used 
in deriving the last step. Furthermore, a system of reversible reactions will always reach 
chemical equilibrium and in such conditions the system is characterized by detailed balance, 
i.e., fz+UP) = /2-(0)) the forward and reverse rates of each elementary reversible reaction 
balance [30|. Hence by Eq. (63), Dhki = 0, in detailed balance conditions, and consequently 
by Eqs. (48-50) and Eqs. (54) and (56), the CFPE's predictions of mean and variance are 
accurate to order Q^"^. Equilibrium conditions always imply detailed balance and hence our 
results suggest that the size of the differences between the predictions of the CFPE and the 
CME increase with how far is the system from equilibrium. 
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V. APPLICATIONS 



A. Dimerization 

As a first application of our tlieory, we will estimate the relative errors in the CFPE 
predictions for a dimerization reaction. This is the simplest case of a bimolecular reaction 
mechanism. The main purpose of considering such a reaction is that both its CME and 
CFPE are exactly solvable and hence it provides us with a direct test of our expressions for 
the leading order error in the means and the variances as predicted by the CFPE. The set 
of reactions under study are: 

^X, 

X + X J^Y. (64) 

Monomers, denoted as X, are pumped into some compartment at a rate ki. Pairs of 
monomers react with rate constant ^2 to form a dimer molecule, Y. The concentration 
of dimers increases with time however the concentration of monomers becomes constant 
after a short time, i.e. the monomers reach a steady-state. Since Y is not involved in the 
reaction, the mathematical description is solely in terms of the number of molecules of the 
monomers for the CME and CFPE and in terms of the monomer concentration for the RE. 
The CME, Eq. (2), for the dimerization reaction reads: 

dtPin,, t) = kMEi' - l)^(^i> t) + ^{Ef - l)n,{ni - l)P(ni, t). (65) 

Multiplying the equation on both sides by s"^ and summing over n\ from to infinity, we 
get the equivalent generating function equation: 

where F{s,t) = s"^P{ni,t). This partial differential equation is solved in the steady- 
state with boundary conditions F{1) = 1 and F{—1) = [3l| leading to: 

where z = (1 + s)/2, X = VL{ki/2k2Y^'^ and /„ is the modified Bessel function of the first 
kind of order n. The mean concentration and variance of the concentration fluctuations 
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about this mean according to the CME are then given by the following expressions: 



ni 



ds 



s=l 



0l/o(4norfe) 

Ii{4:node) 



(68) 



n 



-2 , 





^ dF{s) 




'dF{s) 






s=l 


ds 


S = l- 



4>l[node[h{4:node)]'^ - ^ode[/o(4node)]^ + /q (4node)/l (4node)] 



(69) 



Note that these expressions are obtained within an exact approach and are not approxima- 
tions as the ones stemming from the system-size expansion of the CME. 

Now we obtain expressions for the mean and variance using the CFPE approach. The 
CFPE, Eq. (33), for the dimerization reaction reads: 



dP{m,t) 



d 



A{m) 



1 



(70) 



dt \ drii ^ ' 2dn{ 

where A = kiQ — 2k2ni{ni — 1)/^ and B = kiQ + 4A;2^i(^i — The exact stationary 

solution of this non-linear second order partial differential equation can be shown to be: 

3kifl^ arctan H{ni) 



exp 



P{ni) 



-rii + 



Ak2{ni - l)ni + ki9? 



K1 + K2 



di] exp 



SkiQ"^ arctan H^t]) 



+ r] 



(71) 



where Fl{x) = \^{2x — 1)/V^i^^ ~ ^2- The constants Ki and K2 are to be determined by 
the boundary conditions and the normalization condition. The boundary conditions of the 
CFPE are P{ni = ±00) = 0. Note that the CFPE unlike the CME does not generally have a 
natural boundary at tt-i = since the noise can sometimes drive the system to negative values 
of rii 32|. Note that this problem is also implicit in the stationary solution of the linear 
Fokker-Planck equation, a Gaussian which is non-zero for negative particle numbers js] (see 
the end of this subsection for a further discussion of boundary conditions). The condition 
at —00 fixes the value of K2 while the condition at 00 is automatically satisfied by the 
exponential pre-factor. The remaining constant Ki is fixed by the normalization condition. 
Since there is no closed form solution for the integral in Eq. (71), Ki has to be computed 
numerically; once P{ni) is determined, the mean and variance can be straightforwardly 
numerically computed as well. 

The exact relative error in the mean and variance predictions of the CFPE can now 
be computed. One first fixes the rate constants ki and /c2 and Uode- The normalization 
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constant Ki is found by numerical integration and from the ensuing steady-state probability 
distribution, one finds the mean, (rii) /D,cfpe , and variance (tI^cfpe- The numerical error in 
the integration is essentially eliminated by performing the integration for a set of decreasing 
step size values and extrapolating to obtain the integral value at zero step size. Using the 
same values of rate constants and node, one uses Eqs. (68-69) to compute the mean and 
variance according to the CME. The exact relative errors in the mean and variance can then 
be found using 1 — {{rii) /Qcfpe) / {{ni) /^) and 1 — cfI^cfpeI ^i^ respectively. The exact 
absolute values of the relative errors in the CFPE predictions are shown by the red open 
circles in Fig. 1 for parameter values ki — 1 and k2 — 2. Note that the relative error in 
the variance is larger than that in the mean. The errors increase with decreasing steady- 
state numbers of monomers. Even for very small numbers, the errors arc quite small. For 
example for a case in which the REs predict 5 monomers in steady-state, the percentage 
relative errors in the mean and variance predictions of the CFPE are just 0.5% and 6.5% 
respectively. The high accuracy of the CFPE in low particle number conditions is indeed 
surprising since typically it has only been deemed accurate for systems characterized by 
large particle numbers. 

Wc can now test the accuracy of the theory developed in the previous sections by using it 
to obtain expressions for the approximate relative errors in the mean and variance and then 
compare these with the exact values as already obtained above. By inspection of the reaction 
scheme, Eq. (64), it can be easily deduced that the stoichiometric matrix is 5" = (1, —2). 
From the definition of the macroscopic rate function vector (see Introduction) it also follows 
that it is equal to /(0i) = {ki,k2(f)l). This is all the information needed to calculate the 
estimates for the relative errors using our theory. The macroscopic concentration and the 
relevant entries of the D and J matrices evaluated at steady-state are then given by: 

^^=(^Y\ (72) 



\2A;2/ 

2 2 

Du = ^(-51,) V.(0i) = h + 4A;20?, An = $^(5i,)V.(</>i) = -^i - ^hc^l, (73) 
i=i i=i 
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These are substituted in Eqs. (48-50) which are then evaluated at steady-state, leading to: 

Am = ~ = -^u (75) 
An ^ 4. (76) 

The relative error in the mean concentration to leading order is then given by Eq. (55): 

1 _ 1 

where node = ^4>i is the average number of monomers as predicted by the REs. To compute 
the relative error in the variance we need to first estimate the variance to the linear-noise 
level of approximation. This is done by solving Eq. (23) in steady-state: 

The variance is then (jf^iNA = ^^^[^ilo- Using the latter and Eq. (76), it is found that Eq. 
(57) evaluates to: 

The theoretical absolute values of the relative errors in the CFPE predictions, as given 
by Eq. (78) and Eq. (80), are shown by the solid blue lines in Fig. 1 for parameter values 
ki = 1 and k2 = 2. The theory is generally in very good agreement with the exact solution; 
small discrepancies are only apparent in the error for the variance at molecule numbers 
less than approximately 5 monomers. The comparison has also been done for many other 
parameters values and as predicted by theory, in all cases, the graphs are the same as shown 
in Fig. 1. 

We have also computed the exact errors by solving the CFPE with different boundary 
conditions. One could argue that constraints should be imposed on the CFPE such that it 
preserves the natural boundary of the CME at ni = 0. This can be fulfilled by requiring that 
the probability current of the CFPE vanishes at rii = ^]. In such a case the stationary 
solution of the CFPE has the form of Eq. (71) with K2 = and Ki is found by requiring 
that the solution is normalized on (0, 00). The exact errors computed with this new solution 
of the CFPE are practically indistinguishable from the previous solutions shown in Fig. 
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1 except for a small discrepancy at nooE = 3. The excellent agreement of our theoretical 
solution with both CFPE solutions is simply due to the fact that the probability of rii taking 
negative values in Eq. (71) is very small, unless Uode is also very small. 



B. Enzyme catalysis: the Michaelis-Menten mechanism 

As a second application, we consider the catalysis of a substrate species S into a product 
species P by an enzyme species via the Michaelis-Menten mechanism 3J] : 



0-^5, S + E^C, (81) 
fci 

C %E + P, (82) 

where E denotes the free enzyme, i.e. when it is not bound to substrate, and C denotes the 
substrate-enzyme complex. We will denote substrate, complex and free enzyme as species 1, 
2 and 3 respectively. Note that the product species is missing from the kinetic description 
because it is a byproduct of the reaction and thus not involved in the reactions. The total 
enzyme concentration is a constant, 02 + 03 = {^2/^) + (^3/^) = Et, since the enzyme 
is either bound to substrate or unbound. Hence we effectively have a two variable system. 
The reaction system exhibits a steady-state in the concentrations of substrate and complex 
whenever the inequality km < k2ET is satisfied, i.e. when the rate at which substrate is 
pumped into the system is less than or equal to the maximum rate at which the enzyme can 
convert substrate to product. Assuming such conditions, our aim is to calculate the relative 
errors in the mean and variance predictions of the CFPE, i.e. Eqs (55) and (57); to achieve 
this, we will first need to solve Eqs. (48)-(50), which we show in detail now. 

The stoichiometric matrix and the macroscopic rate function vector follow directly from 
their definitions (see Introduction): 



1-11 
1-1-1 



, /(01, 02) = {kin, ko{ET - 02)01, A;i02, A;202}. 

The rate equations and the D and J matrices follow by inserting the above in Eq. (1), Eq. 
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(16) and Eq. (17) to obtain: 

01 = ^M^-^, h = Et{1 - /3), (83) 

Jl = -ko{ET - 02), Ji =ki + /co(/.i, 4 = - Ji, J| = -{ki + k2 + ko(j)i) (84) 

J;J^^ — — = = 0, — Jf^ — —J^ — —J^ — ko, (85) 

Dui = D222 = 0, (86) 

1^112 = D121 = D211 = -Du2 = -D212 = -D221 = Eriki + k2)v{l - P), (87) 

Dn = D22 = 2ET{ki + k2){l - /3), = D21 = ^t(A;i + A;2)(l - /3)(7; - 2), (88) 

where (3 = 1 — kin/{k2ET), Km = (fci + k2)/ko is the Michaehs-Menten constant and rj = 

l-k,/{koKM). 

Note that /3 is a measure of enzyme saturation since as the input rate of substrate, 
kin, approaches the maximum rate at which the enzyme can catalyze the reaction, k2ET, 
the proportion of enzyme in complex form increases accordingly, as can also be seen from 
Eq. (83). Note also that 77 is a measure of how far is the system from equihbrium. This is 
since if substrate binding would occur at equilibrium, i.e., k^ = k2 = 0, then the relationship 
between the macroscopic concentrations would be 0103/02 = ^1/^0 while generally in steady- 
state conditions, i.e. k^n > 0, A;2 > 0,/3 < 1, the relationship between the macroscopic 
concentrations is 0103/02 = Km- Both (3 and 77 are non-dimensional, positive fractions. 

Substituting Eqs. (84-87) in Eqs. (48-50), setting the time derivative to zero and solving 
the resulting set of simultaneous equations we obtain: 

A _ -(l-/3)/3r/ 
' 2KMil+uf3^f + ETf3^{l + uf3^)r]' ' ' ^ ^ 

A12 - A21 - -A22 - -2(rTWTW+^' ^ ^ 

where u = Et/Km- 

The leading order term of the relative errors in the mean concentrations of substrate and 
complex, as predicted by the CFPE, are then given by substituting Eq. (83) together with 
Eq. (89) in Eq. (55): 

rpl _ ~P'^V p2 _ Q /q2\ 

— KlQ^{l + u(3^){2 + uP^{4 + P{2u(] + 7]))y ~ 
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Note that the relative error in the substrate concentration is always negative, i.e., the CFPE 
overestimates the mean substrate concentrations and it increases with the distance from 
equilibrium, 77. There is no error in the CFPE estimate for enzyme concentration (at least 
to order Q^"^). 

To calculate the relative errors in the variance using Eq. (57) we first need to compute 
the variance as estimated by the linear-noise approximation. This is obtained by solving 
Eq. (23) using Eq. (84) and Eq. (88): 

2 _ KM{l-m + uf3^ + il3-^)H 2 _ ET{l-f3)f3{l + uf3) 

Finally substituting the above two equations and Eqs. (90-91) in Eq. (57) we obtain 
the leading order term of the relative errors in the variance of the substrate and complex 
concentration fluctuations, as predicted by the CFPE: 

= ^ (q^) 

'"^^ KMn{l + uf3){2 + uf3^{4 + f3{2uf3 + r]))y ^ ^ 

Note that both relative errors are always positive implying that the CFPE underestimates 
the variance. 

We can now use the formulae given by Eq. (92), Eq. (94) and Eq. (95) to estimate the 

relative error of the CFPE when modeling conditions typical of the intracellular environment. 

A principal characteristic of such an environment is that the number of molecules of some 

species can be quite small. A detailed protein abundance profiling of the Escherichia coli 

I I 

cytosol by Ishihama et al [25] shows that the total number of enzyme molecules per cell 
approximately varies from a hundred to a few thousands. It is indeed in this limit of small 
numbers that it is frequently thought that the CFPE and the CLE description are not very 
accurate. We quantitatively test this hypothesis using our formulae. We will first express 
our error formulae in terms of the average number of molecules of substrate and free enzyme 
as predicted by the REs, i.e., Ui^ode = <Pi^ and ti^^ode = 03^- Using Eqs (83) we find that: 

Km^ = ' n , u = — . (96) 

1 - P rii^oDE 

Substituting Eq. (96) in Eq. (92), Eq. (94) and Eq. (95) we get expressions for the errors 
in terms of ui ode, n-i,ODE^ P and rj. Given fixed molecule numbers, ui^ode and u^ ode, we 
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can find the maximum error by varying /3 and rj over their allowed range [0, 1]. Repeating 
this procedure for various molecule numbers we can obtain simple two dimensional plots 
of the maximum error. The results for the maximum relative error in the predictions of 
the variance are shown in Fig. 2. The results verify that the predictions of the CFPE 
become increasingly accurate with increasing molecule numbers. They also show that the 
error incurred by using the CFPE for cases of small molecule numbers is very small: less 
than 1% for a few tens of molecule numbers. 

It is noteworthy that this accuracy is far better than even that hypothesized by proponents 
of the CFPE jlj]. For example Gillespie in his seminal paper on the derivation of the CLE 
{l4 | remarks in his conclusion that the CLE (and hence the CFPE) approximation is probably 
not a good one when one models a system composed of three time- varying species with total 
molecular population of 2000 since it appears quite possible that the molecule number of at 
least one of the species becomes significantly small at some point in time. In contrast our 
theory seems to predict that the CFPE predictions will still be very accurate even when the 
molecule numbers are quite low. 

We have tested these predictions by numerically solving the CLE for the Michaelis-Menten 
process using the Euler-Mayurama method to obtain the mean substrate concentrations and 
the variance of the substrate fluctuations about the means. The same were obtained from 
stochastic simulation algorithm simulations of the CME. The results are shown in Fig. 3. 
The parameters are chosen to be ko = 272, ki = 8, k2 = 60, Et = 100, k^ = 5880 and 
Q = 25 since this gives conditions similar to those mentioned by Gillespie above. The RE 
solutions, Eqs. (83), with the above parameters lead to 0i = 12.25, 02 = 98 and 03 = 2 
which, given a volume of i7 = 25, would imply tii^ode = 306.25, n2^oDE = 2450 and 
ns^ODE = 50. The total molecular population of enzyme (free plus complex form) is 2500 
molecules. Each algorithm (Euler-Mayurama and stochastic simulation algorithm) was run 
5 times leading to 5 independent estimates j35|. Note that even though the mean number of 
free enzyme molecules is considerably low, the predictions of the CFPE for both the mean 
and the variance agree (within sampling error) with those of the CME. For comparison we 
have also plotted the predictions of the linear-noise approximation (red lines) and of the 
mean concentration as predicted by the Effective Mesoscopic Rate Equation Eq. (60) (blue 
line). The results clearly confirm that the CFPE is more accurate than the linear Fokker 
Planck equation associated with the linear-noise approximation and that indeed the mean 
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concentrations of the CFPE are in excellent agreement with the Effective Mesoscopic Rate 



Equations derived in Ref 



Menten reaction was first obtained in Ref 



The Effective Mesoscopic Rate Equation for the Michaelis- 
(See Eq. (29) in the latter reference). 



For our set of parameters, the theoretical expressions, Eqs. (92) and (94), evaluate to 
^mean — ~2.9 X 10~^ and El^^ = 3.2 X 10^^; these errors are so small that they are clearly 
masked by the sampling error inherent in the calculation of the mean and the variance from 
the long-time simulation trajectories. Indeed, in agreement with our theory, from Fig. 3 one 
can detect no significant difference between the CFPE and CME predictions. The numerical 
experiments were performed with various other parameter sets - in all cases we could not 
detect any discrepancy between the CFPE and CME predictions within sampling error. 



VI. DISCUSSION AND CONCLUSION 



Summarizing, in this article we have shown that (i) the mean and variance predictions of 
the CFPE are accurate to order Since those of the linear Fokker-Planck equation are 

accurate to order for the mean and for the variance, it is clear that the CFPE is 

generally more accurate than the linear Fokker-Planck equation or equivalently the linear- 
noise approximation, (ii) for detailed balance conditions, the predictions of the CFPE are 
even more accurate, order i.e. in equilibrium or near equilibrium conditions the CFPE 
does an excellent job of approximating the CME. (iii) accuracy to such high order in inverse 
powers of the system volume implies that the CFPE estimates should be quite good even 
for small populations of molecules. Our simulations for dimerization and enzyme- catalyzed 
reactions support these theoretical conclusions, with impressively good agreement down to 
an average of 5 molecules for the dimerization example. 

The CFPE's accuracy is indeed surprising given that it arises out of a naive truncation of 
the Kramers- Moyal expansion of the CME and that the CFPE cannot be obtained from the 
systematic system-size expansion of the CME. Only the linear Fokker-Planck equation can 
be derived from the latter expansion by considering terms of order Q^. This equation leads to 
mean and variance estimates which are accurate to orders fi"^/^ and Now if one wants 

more accurate estimates one needs to consider higher-order terms in the expansion. To get 
mean concentration estimates to order one needs to consider the term in the system-size 
expansion proportional to j28|. To this order, one does not obtain the CFPE, rather 
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one obtains a partial differential equation with a third-order derivative. However, it turns 
out that the mean calculated from this equation precisely agrees with that calculated from 
the CFPE to order Q^^. If we even wanted to get more accurate means and variance, say 
both to order we need to consider terms in the system-size expansion to order 
This leads to a partial differential equation for the time evolution of the probability density 
function with derivatives as high as fifth order. Once again this is not the CFPE. However 
under steady-state conditions obeying detailed balance, the estimates from this high-order 
differential equation and the CFPE exactly agree to order Hence we have shown that 
though it is true that the CFPE does not arise out of the system-size expansion, nevertheless 
its predictions are better than those which can be obtained by considering only the first term 
of the expansion (the linear- noise approximation) as is conventional It follows that the 
non-linear character of the CFPE is not completely spurious as originally suggested by van 
Kampen 

Our study is the first one to our knowledge which systematically analyzes the validity of 
the non-linear multivariate CFPE and which derives approximate expressions for the size of 



the errors in the CFPE estimates - previous studies 37|, 1381] have focused on the CFPE for 



unimolecular reactions and for unimolecular and bimolecular reactions involving one species 



29| . Our analysis is based on the system-size expansion and thus has the same limitations. 



namely that it is only applicable for chemical systems which are "asymptotically stable in 
the sense of Lyapunov" . This implies that from our analysis we cannot draw any conclusions 
for bistable systems [sj. Within these constraints, the system-size expansion is a legitimate 



means of obtaining the moments of the CME accurate to any desired order [5[ . A few authors 



13| have expressed reservations regarding the accuracy of the expansion beyond the linear- 



noise level, their reasoning stemming from the fact that Pawula's theorem j39j states that 
a time-evolution equation for a probability density function with higher than second-order 
derivatives cannot describe a stochastic process. However these misgivings are undue - the 
higher-order partial differential equation stemming from the expansion truncated to some 
order is "not an exact equation for a Markov process that in some way approximates the 



original process; rather it is an approximate equation for the exact P." 



This statement 



of van Kampen is generally true for any legitimate expansion of the CME, not only the 



system-size expansion; for example Risken and VoUmer 



40| showed that taking into account 



higher-order derivatives than two in the Kramers- Moyal expansion of the CME also leads to 
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more accurate solutions than if one just had to use the CFPE. The accuracy of the system- 
size expansion beyond the hnear- noise approximation has also been verified by many recent 
studies 
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36 



41 



-441]. putting at rest any small doubts about its general validity. Finally, 
the good agreement of our theoretical expressions for the errors with simulations is a clear 
indication of the soundness of our system-size expansion based approach. 

Concluding our results offer theoretical and numerical support for Gillespie's hypothesis 
[l^ regarding the validity of the CFPE in both mesoscopic and macroscopic systems. Our 
formulas provide a simple means to estimate the error in the predictions of the CFPE and the 
associated CLE and hence should be of wide applicability to both theoretical and numerical 
studies of stochastic chemistry. 
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Appendix A: Subtleties of the perturbative expansion in the probability density 

By the normalization condition and the expansion of Il{e,t) we have: 

/OO p 
U{e,t)de = 1 = 5^/ Ilj{e,t)n-^/^de. (Al) 

Equating powers of the volume we obtain: 

Uode = 1, (A2) 
njde = 0,yj>l (A3) 



/ 



A„ analogous property has been discussed by Gardurer „r the d,flere,rt though related context 

of small noise expansions of the Fokker-Planck equation [8]. The two properties above are 
useful in the computation of the integrals needed to arrive to Eqs. (23-32); for more details 
see Appendix B. 

It follows from Eqs. (A2-A3) that only IIo is a genuine probability density while the 
higher orders are negative in some regions of the e space. From Eq. (15), we see that to 
order Q^, the time-evolution of IIo is given by a linear Fokker-Planck equation: 

^^^^ = -Jrd.ieM + ^Ap^jHo, (A4) 
32 



which again verifies that Ho is a probabihty density. However the time-evolution equations 
for Uj where j > 1, involve derivatives of order larger than two and hence by Pawula's 



theorem [39f| 11^ cannot be genuine probability density functions. 

The above arguments also imply that it is not correct to think of [ekem--^r]j, where j > 1, 
as genuine statistical moments; rather they are best considered as placeholders or labels for 
the associated integrals / ekem--€r^jde. In the main text we refer to them as corrections to 
the moments to order fi"-'/^. It is however important to bear in mind that though [ekem--£r]j 
are generally not true statistical moments, their linear superposition via Eq. (19) is a genuine 
statistical moment. Hence it is best to avoid associating any physical meaning to [ekem--^r]j 
and to simply regard them as a means to obtain the desired answer, i.e., (efcem..er). 

Appendix B: Detailed derivation of the time-evolution equations for [erefc]2 

The time-evolution equations are obtained by substituting Eq. (18) in Eq. (15), multiply- 
ing the resulting equation on both sides by er-efc and integrating over de. Finally we equate 
terms of order on both sides of the equation to obtain the time-evolution equation for 
[erefc]2. The right hand side of the resulting equation simplifies by performing integration 
by parts; there are 8 integrals which need such evaluation and we treat each one of them 
below. 

1. 

j erekdi{e^Il2)de = -J]^ J e^ull2[ek6i^r + (^A,k]de 

= -j:M2-jn^v.erh. (Bi) 

Note that in Eq. (15) we are summing over all twice repeated indices, which for the 
above integral are i and w. Use was made of this implicit summation on i in the 
derivation of the last step. 



= Dip{5p^rk,k + ^p,kk,T) j Hsc/e = 0. (B2) 
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6. 



7. 



In the last step, we have made use of the fact that J Il2de = 0, as shown in Appendix 
A. 



— ~Ji"^{[^w^p^r]l^i,k + [^w^p^k]l^i,r) 

— ~Jk^[^w^p^r]l — J^^i^w^p^kll- (B3) 

= -J^^'^\[er]lSi^k + [ek]lSi,r) 

--Jf%r]l-Jr^%k]l. (B4) 

J^p J ^r^kdip{eyjIli)d€^-J^p J dp{e^Ui)[ekdi^r + ^A,k]de 

= Jj^[^w]l{^p,rSi,k + Sp,kSi,r) 

= 2J- [6^]i. (B5) 

In obtaining the last step we have used the implicit summation over i and p and also 
the symmetrical property, J"^^ = J^, which follows from the definitions given by Eqs. 
(16-17). 

J erekdi{ewno)de = - J^^^^ J e^HolekSi^r + ^rSi^kW 

— ~ Ji''''^\[^w^k]oSi,r + [^w^r]oSi,k) 

^-J:^'\e^ek]o-J:^'\e^er]o. (B6) 
J^p"^ J erekdip{e^em^o)de^-J^p'^J dp{ey,em^o)[ekSi,r + erSi,k\de 

— Ji^'^''[^w^m]o{Sp,rSi,k + ^p,k^i,r) 

— '^Jk'Ti^w^mlo- (B7) 
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Note that in the last step we have used the symmetrical property, JJ^^ — J^"*, which 
follows from the definitions given by Eqs. (16-17). 



>^?^^ J er.ekdip{Uo)de = -J'^p'^^ J dpUo[ekSi^r- + ^A,k]de 

= ji-'\Sp,rkk + Sp,k5i,r) j ^ode^ 2Jl^'\ (B8) 

In the last step, we have made use of the fact that J Uode = 1, as shown in Appendix 
A and the symmetry property used in the evaluation of the previous integral. 
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(a) (b) 

FIG. 1: Dependence of the absolute value of the relative errors in the CFPE prediction of the 
mean, l-E'meanl and variance, l-E'-ya^'l, with the steady-state number of molecules, tIq^^q^ as estimated 
by the rate equations. The red open circles show the errors computed using the exact solutions of 
the CFPE and the CME. The blue lines denote the leading order errors estimated by our theory 
and given by Eq. (78) in (a) and Eq. (80) in (b). Note that the leading order error estimates are 
in good agreement with the errors calculated from the exact solutions. Note also that the error 
made by the CFPE increases with decreasing molecule numbers and that the error in the variance 
is considerably larger than that in the mean, in many cases by more than one order of magnitude. 
See text for details and discussion. 
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FIG. 2: Comparison of the predictions of the CFPE and CME for the Michaelis-Menten reaction 
mechanism. The differences between the two are quantified by calculation of the percentage relative 
error, i.e. 100 x (prediction of CME - prediction of CFPE) / prediction of CME. Panels (a) and 
(b) show the maximum percentage relative error in the CFPE predictions of the variance of the 
substrate and complex concentration fluctuations, respectively. The figures are generated using 
Eqs. (94) and (95) together with Eq. (96); see text for details. The errors increase with decreasing 
molecule numbers; the magnitude of the error is very small in all cases implying that the CFPE is 
a highly accurate approximation of the CME. 
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FIG. 3: Comparison of the predictions of the CLE for mean substrate concentration and variance of 
the fluctuations about the mean, with the predictions of the CME, the hnear-noise approximation 
(LNA) and the mean concentration as predicted by Effective Mesoscopic Rate Equations (EMRE). 
Note that the CLE, within statistical error, is in agreement with the CME. The CLE predictions 
are more accurate than those obtained from the hnear-noise approximation. The mean substrate 
concentration of the CLE agrees very weh with the predictions of EMRE, Eq. (60). See text for 
details. 
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